Conversation
This commit introduces type aliases:
- `MerkleTreeNodeIndex` for `u64`
- `MerkleTreeLeafIndex` for `u64`
- `MerkleTreeHeight` for `u32`.
It also refactors `MerkleTree` to avoid using `usize` where-ever
possible, particularly in public functions. Instead, these types
aliases are used.
`MerkleTree` internally stores a `Vec<Digest>`, which induces two
limitations:
1. The index type is `usize`, which complicates matters when
targeting compilation on 32-bit machines (including 32-bit WASM).
2. This vector can store at most 2^25 nodes, whereas we would like
functions associated with Merkle trees to work for Merkle trees
that are far larger (without needing to store all the internal
nodes explicitly).
The second limitation persists. However, the first limitation is fixed
by this commit.
Also, public functions `nodes()` and `leafs()` now return iterators
instead of slices. As these functions were public, this change is
breaking.
Addresses #250.
Needed downstream, for compressing `RemovalRecord` lists.
Also: increase line count.
7a13836 to
185df8e
Compare
|
I'm not sure type aliases are the way to go; as aliases, they are not distinct types. If we really want to go that way, I suggest to use the newtype pattern instead. pub struct MerkleTreeNodeIndex(u64);
impl MerkleTreeNodeIndex {
const MAX: u64 = 1 << 63;
}
// similar for these two:
pub struct MerkleTreeLeafIndex(u64);
pub struct MerkleTreeHeight(u8);
// maybe implement std::ops::DerefI understand the primary goal is to enable compilation on 32-bit targets. I think this can be achieved in a simpler way: const MAX_NUM_NODES: usize = {
#[cfg(target_pointer_width = "16")]
{ compile_error!("target pointer width 16 is not supported") }
#[cfg(target_pointer_width = "32")]
{ 1 << 25 } // comment as to why
#[cfg(target_pointer_width = "64")]
{ 1 << 32 }
};However, I'm unsure we even want to have this constant. Its documentation states that it “[e]nforces that all compilation targets have a consistent Also note that the second limitation you outline, namely that a vector can store at most Footnotes
|
I am familiar with the difference between type aliases and newtypes. I do not see the link between type aliases not being distinct types and type aliases being a poor strategy in this particular case or in general. I think you might sneaking in objectives to the discussion that I did not have going in. Which is fair. The dialectic provoked by different objectives usually exposes superior architectures. But let's be wary of letting the perfect be the enemy of the good. I realize I was not explicit about my objectives going in, so let me correct that. My goals were:
Non-goals were:
In light of that statement of objectives, I'm not sure why newtypes would be preferable over type aliases.
Good point! My first thought after reading it stated so succinctly is, "let's kill it". Let me sleep on it to see if other thoughts follow up.
Thanks for the clarification. I'll change the code accordingly. |
|
I can't see any downside from removing the various |
jan-ferdinand
left a comment
There was a problem hiding this comment.
I think removing the various MAX_* constants is the way to go.
| pub fn authentication_structure_node_indices( | ||
| num_leafs: MerkleTreeLeafIndex, | ||
| leaf_indices: &[MerkleTreeLeafIndex], | ||
| ) -> Result<impl ExactSizeIterator<Item = MerkleTreeLeafIndex> + use<>> { |
There was a problem hiding this comment.
This is the reason I'm not a fan of creating aliases for simple types: the function name and the documentation imply MerkleTreeNodeIndex for the iterator's item type, yet it is MerkleTreeLeafIndex.
There was a problem hiding this comment.
Good catch! This should absolutely be MerkleTreeNodeIndex.
I understand the drawback of not having the compiler's help to identify inconsistencies. It leads to wrong aliases slipping through the cracks, like here. But if that's a compelling argument against using type aliases, then it's also a compelling argument against comments in the code describing what's going on. Do you have a similar policy on comments (not including doctests or docstrings)?
There was a problem hiding this comment.
To a certain degree. If the function's signature (including name) fully explains what the function does, then I don't add the noise that a comment would now be. However, since the comment has to follow fewer rules than the name of the function, it's easy to add information not captured in the signature, which makes adding a comment worthwhile often enough.
All that said, I might misunderstand what you're getting at, since you are excluding docstrings from consideration. The type aliases are public, as should the description of a function's “what” be, meaning it should go in the docstring.
There was a problem hiding this comment.
Type aliases can have docstrings, and these particular docstrings can be used to document index conventions. This solution strikes me as better than the alternative, which is to document the relevant bits of the index conventions in the functions' docstrings. This alternative leads to a scattered convention documentation.
|
We should probably add a CI workflow (or modify an existing one) to check that compilation on 32-bit architectures works. The following is a starting point: rustup target add i686-unknown-linux-gnu
cargo clippy --target i686-unknown-linux-gnuI'm not familiar with all the pitfalls of cross compilation, but this seems like a starting point. Here's a list of all targets. Out of the Tier 1 targets, there are two 32-bit architectures: |
bffd0c6 to
24e555b
Compare
Co-authored-by: Jan Ferdinand Sauer <ferdinand@neptune.cash>
24e555b to
53b3ee2
Compare
- chore: Use correct type alias - style: Use standard strings for primitive type conversion error - docs: Explain impossible source of error Co-authored-by: Ferdinand Sauer <ferdinand@neptune.cash>
Specifically: kill `nodes()` and `num_nodes()`. If you want to get a specific node, use `merkle_tree.node(i)` instead of `merkle_tree.nodes()[i]`.
Drop the `const`s `MAX_NUM_NODES`, `MAX_NUM_LEAFS`, and `MAX_TREE_HEIGHT`. The consts supposedly guarantee cross-platform support. However: - It is unclear if any platform supports trees with 2^64 nodes, so the support guarantee is kind of vacuous. - Transferring Merkle trees between different machines is not a supported use case. Cf. #251. Co-authored-by: Ferdinand Sauer <ferdinand@neptune.cash>
a99850e to
63fa287
Compare
Co-authored-by: Ferdinand Sauer <ferdinand@neptune.cash>
459f305 to
5027e0a
Compare
Co-authored-by: Perplexity.ai
Co-authored-by: Jan Ferdinand Sauer <ferdinand@neptune.cash>
Co-authored-by: Jan Ferdinand Sauer <ferdinand@neptune.cash>
jan-ferdinand
left a comment
There was a problem hiding this comment.
Left a few nits inline. Generally 👍
| return Err(MerkleTreeError::TreeTooHigh); | ||
| } | ||
| Ok(1 << self.tree_height) | ||
| fn num_leafs(&self) -> Result<MerkleTreeLeafIndex> { |
There was a problem hiding this comment.
It feels a bit weird to use a type that is (in name) explicitly ordinal as a cardinal. Maybe usize is actually more appropriate here? Not totally sure.
There was a problem hiding this comment.
I don't understand. Please elaborate.
There was a problem hiding this comment.
To me, num_x implies the result be a cardinal number, but x_index sounds like an ordinal number. usize doesn't claim to be either, and so to me it feels like it can be used for both without triggering this feeling of dissonance.
There was a problem hiding this comment.
How is "index" ordinal? O_o
There was a problem hiding this comment.
It might be more a language thing than a math thing, but since this PR is primarily about names, that feels in scope. “In linguistics, ordinal numerals […] are words representing position or rank in a sequential order.” (source)
There was a problem hiding this comment.
Okay, so I learn. Thanks for the reference.
In my personal linguistic taste (which appears to be contrary to conventions well-established enough to grace wikipedia), an "index" is a string of data that enables you to locate some other data object according to some canonical lookup procedure, and so it is not necessarily either cardinal or ordinal in my book. That said, the dissonance persists: a number of leafs has little to do with instructions about how to find them, unless one assumes a contiguous storage in RAM. That assumption might be the reason why rust's usize gets to benefit from both ordinal and cardinal interpretations.
Making abstraction of that assumption is not an objective here. In other words, the knowledge that a MerkleTree contains nodes that are laid out contiguously in memory is implicit in every endpoint in the interface and that's okay. And so there is no need to communicate that abstraction through adequate naming and no potential for confusion (about that abstraction) through inadequate naming. So I do think it is linguistic dissonance only, and not a smell of something more profound.
Off the top of my head I can think of the following resolutions:
- Use
usizeinstead. Asymmetry eww. More importantly: if we decide thatMerkleTreeLeafIndexshould be an alias foru64instead (or some other type) then we probably want to modify the return type ofnum_leafsin lock-step, whereas this option suggests it is okay to break that link. - Add symmetric type alias
MerkleTreeLeafCount. Overkill. - Find a word or phrase that captures both a number of leafs and their indices and use that instead. I'm happy to entertain suggestions but I'm afraid any such phrase will increase confusion because unfamiliar.
- Keep
MerkleTreeLeafIndexand suffer the dissonance.
I'll leave this PR open for a day or two more in case you or anyone else wants to have a stab at the penultimate bullet point (or even convince me that one of the first two aren't so bad). Barring that, I'll proceed with the last and merge.
There was a problem hiding this comment.
Querying types like HashMap or str for their length results in a usize, even though they (generally) cannot be indexed with that type. To me, this indicates that (1) the first option is consistent with prominent rust uses like the standard library and (2) changing the index type does not imply that the return type for num_*() functions would change.
In any case, options 1, 3, and 4 all seem (to varying degrees) fine to me.
| pub enum MerkleTreeError { | ||
| #[error("All leaf indices must be valid, i.e., less than {num_leafs}.")] | ||
| LeafIndexInvalid { num_leafs: usize }, | ||
| LeafIndexInvalid { num_leafs: MerkleTreeLeafIndex }, |
There was a problem hiding this comment.
Same ordinal / cardinal thing.
Update twenty-first/src/util_types/merkle_tree.rs Co-authored-by: Jan Ferdinand Sauer <ferdinand@neptune.cash> Update twenty-first/src/util_types/merkle_tree.rs Co-authored-by: Jan Ferdinand Sauer <ferdinand@neptune.cash> Update twenty-first/src/util_types/merkle_tree.rs Co-authored-by: Jan Ferdinand Sauer <ferdinand@neptune.cash> Update twenty-first/src/util_types/merkle_tree.rs Co-authored-by: Jan Ferdinand Sauer <ferdinand@neptune.cash> Update twenty-first/src/util_types/merkle_tree.rs Co-authored-by: Jan Ferdinand Sauer <ferdinand@neptune.cash> Update twenty-first/src/util_types/merkle_tree.rs Co-authored-by: Jan Ferdinand Sauer <ferdinand@neptune.cash>
f862aaa to
6220473
Compare
Co-authored-by: Ferdinand Sauer <ferdinand@neptune.cash>
55e5cf7 to
8c942b4
Compare
This PR introduces type aliases:
MerkleTreeNodeIndexforu64MerkleTreeLeafIndexforu64MerkleTreeHeightforu32.It also refactors
MerkleTreeto avoid usingusizewhere-everpossible, particularly in public functions. Instead, these types
aliases are used.
MerkleTreeinternally stores aVec<Digest>, which induces twolimitations:
usize, which complicates matters whentargeting compilation on 32-bit machines (including 32-bit WASM).
functions associated with Merkle trees to work for Merkle trees
that are far larger (without needing to store all the internal
nodes explicitly).
The second limitation persists. However, the first limitation is fixed
by this PR.
Also:
nodes()andleafs()now return iteratorsinstead of slices. As these functions were public, this change is
breaking.
authentication_structure_node_indicesismarked
pubas it is needed downstream.Displaces #250.